Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems

Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem from a neural tangent kernel perspective. Specifically, we first show that, under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities.
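
As a concrete illustration of the kernel view in the abstract, below is a minimal sketch (not the authors' code) that evaluates the empirical neural tangent kernel Theta(x, x') = <grad_theta f(theta, x), grad_theta f(theta, x')> at initialization for a toy FFNet and a toy ResNet. The widths, depths, and the 1/L scaling of the residual branch are illustrative assumptions, not the paper's exact parameterization.

```python
# Empirical NTK sketch for a toy FFNet vs. a toy ResNet (assumed setup).
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, width, depth, in_dim):
    """Gaussian weights with 1/sqrt(fan_in) scaling: depth matrices plus a readout."""
    keys = jax.random.split(key, depth + 1)
    dims = [in_dim] + [width] * depth
    Ws = [jax.random.normal(k, (dims[i + 1], dims[i])) / jnp.sqrt(dims[i])
          for i, k in enumerate(keys[:-1])]
    v = jax.random.normal(keys[-1], (width,)) / jnp.sqrt(width)
    return Ws, v

def ffnet(params, x):
    """Plain feedforward network: compose layers with no skip connections."""
    Ws, v = params
    h = x
    for W in Ws:
        h = jax.nn.relu(W @ h)
    return v @ h

def resnet(params, x):
    """Residual network: identity skip around each nonlinear branch."""
    Ws, v = params
    h = Ws[0] @ x                              # lift input to the hidden width
    for W in Ws[1:]:
        h = h + jax.nn.relu(W @ h) / len(Ws)   # assumed 1/L branch scaling
    return v @ h

def empirical_ntk(f, params, x1, x2):
    """Inner product of flattened parameter gradients at two inputs."""
    g1, _ = ravel_pytree(jax.grad(f)(params, x1))
    g2, _ = ravel_pytree(jax.grad(f)(params, x2))
    return g1 @ g2

key = jax.random.PRNGKey(0)
params = init_params(key, width=512, depth=8, in_dim=16)
x1 = jnp.ones(16) / 4.0
x2 = -x1
print("FFNet NTK :", empirical_ntk(ffnet, params, x1, x2))
print("ResNet NTK:", empirical_ntk(resnet, params, x1, x2))
```

As the width grows, this Gram value concentrates around the corresponding limiting kernel, which is the object the paper compares across the two architectures.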


Review for NeurIPS paper: Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems

Additional Feedback:

### On my overall decision

I am willing to substantially upgrade my decision if the authors can provide strong evidence that is easy to check (i.e., "safety checks") to support the correctness of their propositions/theorems.

Since the size m of the hidden layers becomes infinite, the set of weights tends to a fixed limiting distribution, the same for all layers. Therefore, when m goes to infinity, the time-varying component gets smoothed out. So, when L now becomes infinite, we exactly recover an unrolled, 1-layer recurrent neural network (see the sketch after this review).

Typos:
- "By Representer theorem" -> "By the representer theorem"
- Fig. 2, caption: "CIFAR102" -> "CIFAR2"

Reply to author response
------------------------

Thank you for the additional plots provided in your response, which indeed nicely confirm your main theorems.
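
A hedged sketch of the limit the reviewer describes, in illustrative notation (the width m, depth L, scaling alpha, and kernel map T are assumptions for exposition, not the paper's exact parameterization): with i.i.d. Gaussian weights, a residual update and its infinite-width covariance recursion take the form

```latex
x^{(l+1)} = x^{(l)} + \frac{\alpha}{\sqrt{m}}\,\sigma\!\left(W^{(l)} x^{(l)}\right),
\qquad
\Sigma^{(l+1)} = \Sigma^{(l)} + \alpha^{2}\, T\!\left(\Sigma^{(l)}\right)
\quad (m \to \infty).
```

Because the kernel map T no longer depends on the layer index l, letting L go to infinity just iterates one fixed map, which is the computation of an unrolled, weight-tied (1-layer recurrent) network.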


Review for NeurIPS paper: Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? --- A Neural Tangent Kernel Perspective

Neural Information Processing Systems

After thorough discussion among the reviewers, there is a consensus that this is a good paper that warrants acceptance. There was some skepticism in the initial reviews, but the authors provided a rebuttal that addressed most of the major concerns, and the reviewers updated their reviews/scores accordingly. Hence, the paper is accepted as a poster. In my own judgement, the presentation of this paper should be improved in the camera-ready version.

